ABSTRACT


The primary aims of this study were two-fold: a) to describe average change in the written
narrative performance of second grade students from the fall to the spring of the school year and
b) to examine patterns of individual growth to test for Matthew effects. Participants included 299
children in second grade. Microstructural measures were derived from students’ written 
narratives including: number of different words (NDW), total number of words (TNW), and 
accuracy of spelling and grammar. Significant increases in NDW, TNW, and spelling accuracy 
were evidenced from fall to spring. Students averaged 55 total words in the fall and 69
words in the spring, a statistically significant increase of 14 words (t(299) = 8.4, p < .0001).
The variance in TNW increased significantly from fall (Var = 791) to spring (Var = 1005), and
the correlation between initial fall TNW and growth in TNW was also
significant (r = 0.39). Additionally, results from a two-level hierarchical linear model with
students nested within teachers indicated that initial level of TNW predicted the change in TNW 
from fall to spring, with higher levels of initial TNW being related to larger gains in TNW. 
Significant predictors of Matthew effects included teacher or classroom and free/reduced lunch 
eligibility. Written personal narrative measures are sensitive to developmental change across a 
school year. Evidence of Matthew effects in lexical productivity suggests additional support may
be warranted to ameliorate gaps in writing achievement.

Matthew Effects in Writing Productivity during Second Grade


Written language is well recognized as an essential skill for academic success and 
performance on high stakes tests (Jenkins, Johnson & Hileman, 2004). Students begin
learning to write in preschool and the early elementary grades, and rapid growth is
expected because they must learn to convey ideas clearly in writing within
relatively few years of formal instruction. The development of students’
writing skills has gained increased attention globally in the last decade, as researchers
study ways to cultivate writing skills
(Camache & Alves, 2017) and attempt to model relationships between writing and other
cognitive-linguistic skills for speakers from a multitude of language backgrounds (e.g., Harrison,
Goegan, Jalbert, McManus, Sinclair & Sparling, 2016; Kim & Park, 2019; Yeung, Suk-han,
Wai-ock, & Kein-hoa, 2013).


Among the many motivations to study writing development are observed gaps and
underachievement in students’ writing during the elementary school years. As one
example, in the United States there has been increased awareness of the frequent failure of
students to reach proficiency by fourth grade (National Center for Education Statistics, NCES,
2012). According to the Nation’s Report Card (2012), only one-quarter of students perform at
the proficient level in writing in fourth grade, making it increasingly important to monitor writing
development early and often. Additionally, national statistics show that the
risk of underachievement in writing is even greater for specific demographic groups (e.g.,
students eligible for free and reduced lunch, students from linguistically diverse backgrounds,
and students living in rural areas) (NCES, 2012), adding to the need to monitor writing skill
development early and frequently for students from disadvantaged backgrounds.

The risk for failure to attain writing proficiency appears to be disproportionately greater
for students who enter school with low general language skills (NCES, 2012). This phenomenon 
has been examined in other domains and described as the Matthew effect, in which the rich get 
richer while the poor get poorer (Cook & Campbell, 1979; Merton, 1995; Rigney, 2010). As
Walberg and Tsai (1983) explained, “those who score higher than others on pretests...at the
beginning of an experiment gain absolutely and relatively more than others from the same
experience” (p. 360). The term captures the notion that initial advantage tends to foster further
advantage. Likewise, for students with weaker language skills in the early grades, initial
disadvantage is associated with increasing disadvantage and widening achievement gaps over
time.

Matthew effects have been previously studied in relation to students’ reading 
achievement with inconsistent findings (e.g., Pfost, Hattie, Dorfler & Artelt, 2014). Pfost and 
colleagues (2014) provided a summary of empirical results of Matthew effects in reading after 
reviewing 28 articles including 78 separate results on inter-individual differences in reading. 
Among the inclusion criteria was the report of a covariance or correlation between baseline level
and a growth component. Of the 78 results, 42% demonstrated decreasing gaps in reading
achievement, 26% reflected stable gaps, and 23% showed increasing achievement gaps. The
authors identified challenges to detecting Matthew effects including ceiling effects on 
standardized measures, lack of precision in measures, and a low number of available studies. 

Matthew effects have not been widely studied in relation to writing skills. The notion of 
Matthew effects can be viewed in relation to Merton’s social theory of opportunity structure 
(Merton, 1995). Merton describes opportunity structures as “the scale and distribution of
conditions that provide various probabilities for acting individuals and groups to achieve
specifiable outcomes” (p. 25). In short, initial advantages result in subsequent advantages.
Applying this theory, students with better initial language skills would be expected to experience
greater benefits from opportunities to attain writing skills, creating fan spread growth in which
the gap between students at the low and high range of performance widens
over time. Conversely, it is possible that students with low initial language skills
catch up and close the performance gap during the school year. It is also possible that the slopes 
of change stay relatively stable and the slopes of change between low and high performers are 
relatively parallel to each other across the school year. 
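
The three possibilities described above can be illustrated with a short computational sketch. The Python example below is an illustration only and is not the analysis used in this study. It summarizes hypothetical fall and spring scores by the correlation between initial score and gain and by the ratio of spring to fall variance. The function names, the invented scores, and the simple sign-based labels are assumptions made for illustration, and no significance testing is performed.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def describe_growth_pattern(fall, spring):
    """Label the fall-to-spring pattern descriptively (assumed, sign-based rule)."""
    gains = [s - f for f, s in zip(fall, spring)]
    r = pearson_r(fall, gains)                      # initial status vs. change
    var_ratio = variance(spring) / variance(fall)   # > 1 means scores spread out
    if r > 0 and var_ratio > 1:
        pattern = "fan spread"
    elif r < 0 and var_ratio < 1:
        pattern = "catch-up"
    else:
        pattern = "roughly parallel or mixed"
    return {"r_initial_gain": r, "variance_ratio": var_ratio, "pattern": pattern}


# Hypothetical scores for five students, invented for illustration only.
fall = [20, 40, 60, 80, 100]
widening = [25, 50, 75, 100, 125]    # higher starters gain more
narrowing = [40, 55, 70, 85, 100]    # lower starters gain more
parallel = [30, 50, 70, 90, 110]     # every student gains 10

for spring in (widening, narrowing, parallel):
    print(describe_growth_pattern(fall, spring)["pattern"])
```

With these invented scores, the first spring list is labeled fan spread, the second catch-up, and the third roughly parallel or mixed.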
Theoretical Framework 

The lexical quality hypothesis (Perfetti & Hart, 2002) provides additional support to 
suspect the presence of Matthew effects in lexical productivity. Based on this hypothesis, 
children vary in the quality of their lexical representations. Children with high quality 
representations demonstrate more complete semantic information, phonetic information, and 
fully specified orthographic representations (i.e., spelling). In contrast, children with low quality 
lexical representations may know a meaning but not readily retrieve the corresponding word or 
have incomplete orthographic representations of the word. The lexical quality hypothesis is 
generally applied to reading, in which less skilled readers have fewer high quality representations 
(Perfetti & Hart, 2002). Applying the hypothesis to writing, children with low quality 
representations may demonstrate less breadth and depth of vocabulary in their writing and may 
also get stuck when attempting to spell a word due to weak orthographic representations. 

Although the lexical quality hypothesis is generally applied to reading, its relevance to 
lexical productivity in writing is based on the intertwined skills involved across the modalities of
reading, writing, and oral language. It is generally accepted that oral language provides support
for writing (e.g., Kim & Schatschneider, 2017). The connectedness of reading, writing, and oral
language is grounded in theories that emphasize the common underlying constructs
or constellations of knowledge shared by oral language, reading, and writing (Fitzgerald &
Shanahan, 2010; Shanahan, 2006).


Applying the theoretical perspective that writing is the encoding of oral language 
(Berninger & Amtmann, 2003; Kim & Schatschneider, 2016), writing has been viewed as
comprising the component skills of transcription, or the written production of letters and words
(Singer & Bashir, 2004), and composing, or generating ideas (Kim, 2016; Kim & Schatschneider, 2016;
Kim, Park, & Park, 2015). From this perspective, writing and oral language skills are integrally 
intertwined, as writing relies on interconnected language skills including word knowledge. As 
such, we suspect that differences in quality of lexical representations may be detectable in early 
writing performance. This notion is supported by the fact that measures of writing have also been 
studied relative to the overlap in knowledge area with oral language and reading (Fitzgerald & 
Shanahan, 2000). 


Microstructural Measures of Writing 


Observable developmental change in writing during the early grades may be influenced 
by which aspects of writing are considered and how such components are measured. 
Dimensions of writing vary across studies. In a study of 186 first and fourth grade students by 
Wagner et al. (2011), results of a confirmatory factor analysis supported four composition factors 
including: macro-organization, complexity, productivity, and mechanical errors. Other studies 
have focused on the linear process of planning, writing, and revising (e.g., Goertz, Duffy, &
LeFloch, 2001) and included measures of productivity, complexity, accuracy, and mechanics
within such phases of the writing process (Koutsoftas & Gray, 2013). Koutsoftas and
Gray (2013) examined dimensions of writing within and across phases of writing through
structural equation modeling for 267 typically developing students in sixth grade. Results
supported a linear writing process of planning, translating ideas, and revising and the
conceptualization of translating ideas as comprising multiple factors, including productivity,
complexity, accuracy, and mechanics. Although writing components of interest vary across studies, to
investigate Matthew effects we focus on several microstructural aspects of writing including
lexical measures (e.g., diversity and productivity) and accuracy (e.g., spelling and grammar).

Among common indices of writing, lexical diversity and productivity are widely reported 
in the literature. Lexical productivity, generally measured by total number of words (TNW), has 
been used as a standard measure of fluency and productivity in curriculum based writing 
measures for decades (Marston, 1989) with reported correlations with the Test of Oral and
Written Language as high as .84 (e.g., Deno, Marston, & Mirkin, 1982). Other studies have
found weaker relationships depending on the assessment measure compared (e.g., Gansle, Noell,
VanDerHeyden, Naquin, & Slider, 2002). Similarly, lexical diversity, or number of different
words (NDW), is highly correlated with productivity (authors, 2018) and widely utilized in
previous studies (Fey, Catts, Proctor-Williams, Tomblin, & Zhang, 2004; Wagner et al., 2011) as a
measure sensitive to developmental change. Additionally, previous studies have shown that
greater diversity of word use is correlated with language proficiency levels (Grant & Ginther, 
2000; Jarvis, 2002; Yu, 2009). 

The rationale for attending to the lexical measures in the exploration of Matthew effects 
is multifaceted. First, lexical count measures offer potential sensitivity to Matthew effects given 
that TNW is not inhibited by ceiling effects and demonstrates a developmental progression in
school age students (Fey et al., 2004; author et al., 2018). Further, the meaningfulness of writing
productivity is supported by significant relationships with standardized measures of vocabulary
knowledge (Grant & Ginther, 2000; Miller, Andriacchi, & Nockerts, 2015; author et al., 2018) 
and relationships to writing quality for children in the elementary grades (e.g., Abbott & 
Berninger, 1993; Graham, Berninger, Abbott, Abbott, & Whitaker, 1997; Wagner et al., 2011). 

Among other measures considered in the constellation of writing, accuracy is generally 
included (Goertz et al., 2001; Koutsoftas & Gray, 2013; Wagner et al., 2011). The consideration
of spelling accuracy, specifically, is supported by the wide recognition of spelling as an essential
component of writing (Berninger, Abbott, Nagy, & Carlisle, 2010; Devonshire & Fluck, 2010).
Furthermore, spelling skills have been reported to predict text composition in students in first
through seventh grade (Abbott, Berninger, & Fayol, 2010), are sensitive to developmental change
in writing (Dockrell, Connelly, Walter, & Critten, 2015) and differentiate children with language 
learning difficulties (Broc, Bernicot, Olive, Favart, Reilly, Quémart, & Uzé, 2013). 

In addition to spelling accuracy, grammatical accuracy is often considered in measuring 
students’ writing. A number of previous findings support that measures of correct writing 
sequences are sensitive to student progress over time (Dockrell et al., 2015; Malecki & Jewell, 
2003). Further, grammaticality, or proportion of grammatical errors, has been found to be 
sensitive to achievement differences and differentiates between children who are typically 
developing and children with language impairments (Eisenberg & Guo, 2013; Scott & Windsor,
2000). In a study by Scott and Windsor (2000), the extent of grammatical error was the only 
measure that distinguished children with language learning disabilities from their peers. 

Previous studies have coded accuracy in a number of different ways, including categories 
of spelling errors (e.g., Bahr, Silliman, Berninger, & Dow, 2012; Quick & Erickson, 2018;
Masterson & Apel, 2013) and categories of grammaticality (Eisenberg & Guo, 2013). Other
studies have reported proportion of spelling errors (Dockrell et al., 2015). Similarly,
grammaticality measures are often calculated using an error ratio (Eisenberg & Guo, 2013; Scott 
& Windsor, 2000) in which the number of grammar errors is measured in relation to total 
number of words. 
Detecting Matthew Effects in Writing 

Given that detection of Matthew effects requires measures that are sensitive to
change across the school year and that have an indefinite range to minimize the
constraints of potential ceiling effects, we focus in the current study on examination of Matthew
effects in lexical productivity and accuracy. Evidence for the developmental sensitivity of lexical 
productivity measures of writing is provided by findings of previous studies that have examined 
average change in written language (e.g., authors, 2017). In one such study (Malecki & Jewell,
2003), investigators examined writing production on a three-minute writing task administered at
two time points (fall and spring) to 946 students in first through eighth grade. Students
demonstrated significant increases in writing production and improvements on accuracy
production indices from fall to spring, supporting the utility of such writing measures
for detecting change in writing across a school year. Similarly, in a study by Dockrell et
al. (2015), investigators examined written texts of 192 students who were in third, fourth, and
fifth grade.
The authors reported significant effects of time across a 5-month period for total words produced 
in students’ expository and narrative writing samples. Finally, similar findings were reported for 
students in early elementary grades in a study of average one year change in productivity and 
lexical diversity between subsequent grade levels for 749 children in first through eighth grade 
(author et al., 2018). Findings indicated that lexical productivity in written narratives was
sensitive to one-year developmental change for students in first through third grade. These
previous findings support the expectation of growth in writing productivity and accuracy across
the school year (e.g., Dockrell et al., 2015; Malecki & Jewell, 2003; authors, 2018), but to our
knowledge none have considered Matthew effects.
Influencing Factors 

In a study of Matthew effects in reading by Morgan, Farkas, and Hibel (2008), the
authors examined additional child- and family-level factors associated with differing availability
of resources known to support literacy development (e.g., access to print at home). The authors
considered gender, race/ethnicity, and social class background, which have been established as
predictors of reading growth rate (McCoach, O’Connell, Reis, & Levitt, 2006) and access to
books and literacy related resources at home (Dickinson, McCabe, & Anastasopoulos, 2002).
Growth slopes in reading were significantly lower for males and
students from minority or low SES backgrounds, suggesting Matthew effects may be influenced
by these factors. In contrast, students from high SES backgrounds and majority race/ethnicity
backgrounds maintained their relative rankings but did not demonstrate fan spread effects. Based
on the overlap in underlying skills discussed previously, it is possible that child- or classroom-level
factors may influence writing growth; however, this has not been examined. Additional research
is needed to examine typical growth in microstructural measures of writing, explore differences 
in growth between students, and, if present, investigate potential predictors of fan spread effects 
in writing productivity. 
Research Aims 

The importance of monitoring language and literacy skill development is undisputed, and 
the value of writing skills for academic achievement is widely recognized. Despite this
recognized importance and the emphasis on writing in the academic standards, writing
development has received less research attention than reading or oral language
(Miller & McCardle, 2011). Additional research is warranted to add to our understanding of
typical expected growth in written narratives in relation to standardized oral language measures 
and examine potential Matthew effects. Moreover, additional research is needed measuring 
change in writing performance to improve our understanding of the relationships between written 
narrative measures and standardized assessments across the school year. Perhaps most
compelling is the need to examine change in writing skills for fan spread growth and identify
which students may be at increased risk for widening achievement gaps over time. In response, 
the current study was designed to address the research questions: 

1) What is the average change in written narrative performance from the beginning to the
end of the school year for second grade students? 

2) Are there differences in patterns of individual growth in written narrative outcomes 
for students in second grade? Specifically, is there evidence of Matthew effects in 
individual lexical growth patterns in writing? 

3) What are potential predictors of the Matthew effects in students’ writing measures
(e.g., school, teacher, gender, free/reduced lunch, race/ethnicity, and language of the
home)?


METHOD 


Data for the current project were collected as part of a package of assessment measures 
administered in a larger project funded by the Institute of Education Sciences, U.S. Department of
Education. The study procedures were reviewed and approved by two universities’ committees
on research involving human subjects (HSC#:212777). The current project used extant data
from one year of the funded project with the 13 participating elementary schools. The larger
project examined teachers’ language but did not explicitly teach narratives or include structured
or unstructured writing activities between fall and spring time points as part of the study. 
Participants 

The sample for this study included children from 33 classrooms in 13 schools located in
urban and rural areas of northern Florida and north central Tennessee. Twelve students were
randomly selected for assessment from consented students in each class. The data set for the 
current paper included 299 participants in second grade from the larger study who had complete 
data for the written narrative sample at both fall and spring time points. The sample was 
comprised of 159 girls (52%) and 143 boys (47%) with an average age of 7 years and 6 months 
old (SD = 0.37 years). Of the 299 participating children, 58% were reported as White, 20% 
African American, 14% Hispanic, 5% Asian, and 2% mixed race. Additionally, a small
percentage of the students (16%) were exposed to another language at home by one or more
caregivers who spoke Spanish (n = 32), Arabic (n = 4), Mandarin (n = 2), Amharic (n = 2),
Korean (n = 1), or Dinka (n = 1). The percent of students receiving free or reduced lunch in the
sample was 36%, with 108 receiving free or reduced lunch, two with missing data, and 189 
students (63%) not eligible. 

The investigators administered assessments of global language performance to describe
students’ general language skills, allow for considerations of generalizability, and further
verify that students had typically developing oral language skills.
Performance on the Clinical Evaluation of Language Fundamentals-Fifth Edition (CELF-5;
Wiig, Secord, & Semel, 2013) was used to evaluate whether language skills were within the normal
range. Scores on language and literacy assessments were not used for inclusionary or
exclusionary decisions but for descriptive purposes. Students demonstrated an overall mean
performance on the core language score of the CELF-5 of 101.35 (SD = 15.87), which indicated
they were within the expected average range. No one was excluded based on his or her performance.


Materials for Assessment of Change across the School Year 


Lexical Diversity and Productivity of Written Narratives. In the current study, 
number of different words (NDW) and total number of words (TNW) were calculated as a 
standard output measure of vocabulary using Systematic Analysis of Language Transcripts 
(Miller, 2011). NDW and TNW were considered to be advantageous for measuring Matthew 
effects because they were not constrained by ceiling effects. Additionally, NDW and TNW are
among the most frequently used metrics of vocabulary in writing samples (Danzak, 2011;
Hall-Mills & Apel, 2013; Koutsoftas & Gray, 2013; Price & Jackson, 2015). Lexical productivity and
diversity or the “range of vocabulary in a language sample” (Malvern, Richards, Chipere, & 
Duran, 2004, p.16) are commonly used as indicators of compositional productivity in writing 
(e.g., Abbott & Berninger, 1993; Berman & Verhoeven, 2002; Mackie & Dockrell, 2004; 
Puranik, Lombardino, & Altman, 2008; Scott & Windsor, 2000; Wagner et al., 2011).
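
As a rough illustration of what the two lexical measures capture, the Python sketch below counts TNW and NDW from a transcribed sample. It is a simplified approximation under stated assumptions: words are lowercased and split on non-letter characters, and the function name and sample text are hypothetical. SALT applies its own transcription and counting conventions, which this sketch does not attempt to reproduce.

```python
import re


def lexical_counts(text):
    """Return (TNW, NDW) using a simplified tokenizer (not SALT's conventions)."""
    # Lowercase so "My" and "my" count as one word type; keep apostrophes
    # (straight or curly) that occur inside a word, as in "don't".
    tokens = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    return len(tokens), len(set(tokens))


# Hypothetical sample, invented for illustration only.
sample = "One day when I got home from school I saw my dog. My dog ran to me."
tnw, ndw = lexical_counts(sample)
print(tnw, ndw)  # 17 14
```

Because TNW is a raw count with no fixed maximum, a measure of this kind has no built-in ceiling, which is the property noted above.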

Accuracy of Written Narratives. Two measures of writing accuracy were included as 
additional language measures in the current study. The specific types of errors were not the 
focus of the current study. As such, a broad measure of accuracy based on the proportion of 
spelling and grammar errors was utilized. Investigators calculated proportion of errors within 
two broad categories of error types, including errors of spelling and errors of grammar. 
Calculation of errors as a ratio of errors to total number of words was consistent with measures 
established in previous studies (Eisenberg & Guo, 2013; Scott & Windsor, 2000). 
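
The error ratio described above amounts to dividing each error count by the total number of words in the sample. The Python sketch below shows that arithmetic; the function name, argument names, and example counts are hypothetical and are not taken from the study data.

```python
def error_proportions(total_words, spelling_errors, grammar_errors):
    """Return spelling and grammar errors as proportions of total words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return {
        "spelling": spelling_errors / total_words,
        "grammar": grammar_errors / total_words,
    }


# Hypothetical sample: 50 words, 5 spelling errors, 2 grammar errors.
props = error_proportions(total_words=50, spelling_errors=5, grammar_errors=2)
print(props)  # {'spelling': 0.1, 'grammar': 0.04}
```

Expressing errors relative to sample length keeps longer samples from appearing less accurate simply because they contain more words.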


Standardized Measures of Language

Given that children draw on language abilities when composing written narratives (Kim,
2016; Kim & Schatschneider, 2016; Kim et al., 2015), we included performance on
assessments of general language performance in the fall and spring. Three measures were
utilized to describe participants’ language skills and are reported in Table 1. These include
standardized measures of expressive vocabulary, receptive vocabulary knowledge, and 
formulating oral sentences. Each measure is described below. 

Expressive Vocabulary. We assessed children’s oral expressive vocabulary using the 
Expressive Vocabulary Test-Second Edition (EVT-2; Williams, 2007) in September using Form 
A and again in April using Form B. The EVT-2 is an individually administered, norm-referenced 
test assessing expressive vocabulary and word retrieval. The EVT-2 includes practice/example 
items and 190 test items arranged in increasing difficulty. For each item, the examiner presents a 
picture and reads a question intended to elicit a single word response. Correct responses require 
the child to label (e.g., what shape is this?) or to provide a synonym for a word appropriate to the 
image. Items include different parts of speech, home and school vocabulary, and different levels 
of specificity (e.g., tier 2 and tier 3 words). The test is untimed and takes approximately 15 
minutes to establish the basal and ceiling (5 consecutive incorrect responses). According to the 
manual, the test-retest reliabilities yielded correlations between .94 and .97. Internal consistency 
for each form is reported with high split-half reliability of .94 for Form A and .93 for Form B. 
The reliability between Form A and Form B is reported to be .83-.91. 

Receptive vocabulary. We assessed receptive English vocabulary based on children’s 
recognition of spoken words on the Peabody Picture Vocabulary Test-IV (PPVT-IV; Dunn &
Dunn, 2007). The test provides an array of four color pictures for each vocabulary item. The
examiner asks the child to point to the picture that matches the spoken word.
The child’s response is scored dichotomously, as correct or incorrect. The items are
arranged in sets of 10 items that are intended to become increasingly difficult. A basal is
established (a set containing one or no errors) and the child continues until the ceiling of eight or
more errors in a set is reached. The PPVT-IV is an untimed test normed on a sample of
3,540 participants for use with individuals 2 to 90 years old. Split half reliability by age for
Form A and Form B was M = .94 (SD = 3.6), and ranged from .90 to .97 for ages 5-11.

Formulation of Spoken Sentences. To further describe children’s language skills in 
relation to their growth in written language, we administered the Formulating Sentences subtest
of the Clinical Evaluation of Language Fundamentals-Fifth Edition (CELF-5; Wiig, Secord, &
Semel, 2013) in the fall and spring of the school year. In the Formulating Sentences task, the 
examiner presents a word and asks the child to construct a grammatical sentence using the word. 
The subtest includes words that result in syntactically complex sentences. 
Procedures 

Standardized Oral Language Measures. The standardized measures of language were 
individually administered in random order across two testing sessions of 30-45 minutes each. All 
examiners had completed training on the administration procedure and met a researcher-made
proficiency criterion to ensure fidelity of implementation using the standardized test 
administration procedures and protocols. Completed test protocols were double scored to ensure 
accuracy in calculation of scores following rules for basals and ceilings. 

Writing Task. Investigators administered the writing task to whole classrooms as a group
during the first seven weeks of instruction and the last seven weeks of the school
year. For the written personal narrative task, the same prompt was administered at both the fall
and spring time points. The typed prompt, “One day when I got home from school...”, was provided
at the top of a double-sided lined piece of paper and was similar to written prompts used in
other studies (Connelly, Dockrell, Walter, & Critten, 2012; Dockrell, Ricketts, Charman, & 
Lindsay, 2014; McMaster & Espin, 2007; author et al., 2017). A research assistant read the 
instructions aloud to the class, informing students that they had 10 minutes to write a response.
Specifically, the directions stated,

Do your best writing and please write neatly so we can read it later. Now you are going to
write a story. I am going to read a sentence to you first, and then I want you to write a story
about what happens. You will have 10 minutes to write your story. Do your best work. If you
don’t know how to spell a word, you should do your best and keep writing. You are going to
write a story that begins, “One day when I got home from school...”


Transcription. Research assistants trained in the university’s speech-language pathology 
program transcribed the written samples following traditional procedures in accordance with 
conventions established for Systematic Analysis of Language Transcripts (SALT) (Miller & 
Iglesias, 2010). Lexical items in the students’ writing samples included recognizable real words 
regardless of spelling errors. A written word that was deemed illegible was coded as illegible
and was not included in the number of different words. NDW and TNW were derived by
generating standard measures reports in SALT.

Coding. Children’s use of capitalization, misspellings, and punctuation was maintained. 
Research assistants inserted codes proximal to each deviation from Standard English that 
represented spelling or grammar errors. Examples of commonly occurring grammatical errors
include the omission of past tense, omission of conjunctions and possessive markers, lack of
verb-tense agreement, and lack of singular and plural subject-verb agreement markers. Spelling
errors included any deviation from Standard English spelling rules. Coding rules were
established to operationalize the assignment of error codes. For example, if the possessive 
marker was present but the apostrophe was missing (e.g., principals office), it was considered a 
punctuation error and not marked as a grammar error. In contrast, if the possessive marker was 
not present (e.g., principal office), it was counted as a grammatical error. A single lexical
item could be assigned more than one error. For example, if cousin was misspelled
and lacking a possessive marker (e.g., we played my cuzin game), the spelling error and 
grammar error codes were both assigned. 

Research assistants trained on identifying errors entered the error codes as described 
above. The SALT software was also utilized to aggregate the occurrence of each type of error 
code. The first author reviewed one of every ten transcripts to double score the error codes. It 
was expected that coding errors would occur at a rate of 3% to 6% during sample analysis
based on what is commonly reported in the literature (e.g., Fey et al., 2004; Gillam & Johnston, 
1992; Windsor, Scott & Street, 2000). Agreement was 97% for spelling and grammar errors, not 
including formatting errors. Any disagreements in error assignments were discussed to resolve 
errors as coding continued. Due to the high rate of agreement, further double scoring of spelling 
and grammar was determined to be unnecessary. 
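
As a rough illustration of how an agreement figure like the one above could be computed, the following Python sketch compares two scorers' codes item by item. It assumes simple point-by-point percent agreement; the source does not state the exact agreement formula, and the code labels ("SP", "GR") and example data are hypothetical.

```python
def percent_agreement(primary_codes, second_codes):
    """Percentage of coding decisions on which two scorers assigned the same code."""
    if len(primary_codes) != len(second_codes):
        raise ValueError("both scorers must rate the same items")
    if not primary_codes:
        raise ValueError("no items to compare")
    matches = sum(a == b for a, b in zip(primary_codes, second_codes))
    return 100.0 * matches / len(primary_codes)


# Hypothetical codes for ten words: "" = no error, "SP" = spelling, "GR" = grammar.
assistant = ["", "SP", "", "", "GR", "", "SP", "", "", ""]
reviewer = ["", "SP", "", "", "GR", "", "", "", "", ""]

print(percent_agreement(assistant, reviewer))  # 90.0
```

In this toy example the scorers disagree on one of ten words, so agreement is 90%.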

Analyses 

For the first research question, descriptive statistics were examined for microstructural 
measures of writing at fall and spring time points. We used a mixed model to test for
significant effects of time using the lme4 package in R (Bates, Maechler, Bolker, & Walker,
2015). For the second research question, we examined individual patterns of growth to test for
Matthew effects. With only two time points, more sophisticated methods such as quasi-simplex
modeling or latent growth curves could not be used. However, several analyses remain available
with two time points to examine the presence of Matthew effects. Bast and Reitsma (1997)
discussed the properties that should be present if a Matthew effect were operating. First,
variance should increase over time, accompanied by a strong correlation between time points.
Second, we argue that additional evidence for a
Matthew effect would be obtained if an initial score on an assessment was positively related to
the change from initial score to final score. That is, we operationally defined a Matthew effect as
a significant and positive relationship of initial status with change. With these criteria in mind,
we tested for Matthew effects by examining the fall to spring correlations as well as changes in 
variance in lexical productivity from fall to spring on the writing samples of the students in 
second grade. Additionally, we fit a model that predicted the change score from initial status 
while taking into account the nested structure of the data. Finally, for our third research 
question, we examined potential predictors of Matthew effects (initial writing productivity, 
school, teacher, gender, free/reduced lunch eligibility and language of home) using a two-level 
hierarchical linear model with students nested within teachers and schools. 
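
The two-wave criteria described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: the function names and the five-student data set are hypothetical, and the sketch computes only descriptive quantities (it performs no significance tests and ignores the nesting of students within classrooms).

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def matthew_effect_checks(fall, spring):
    """Descriptive quantities relevant to a two-time-point Matthew effect."""
    change = [s - f for f, s in zip(fall, spring)]
    return {
        "var_fall": variance(fall),        # sample variance at time 1
        "var_spring": variance(spring),    # sample variance at time 2
        "r_fall_spring": pearson_r(fall, spring),
        "r_fall_change": pearson_r(fall, change),
    }


# Hypothetical TNW scores for five students (not data from the study).
fall = [20, 35, 50, 65, 80]
spring = [28, 40, 70, 82, 110]

result = matthew_effect_checks(fall, spring)
print(result)
```

A pattern consistent with a Matthew effect would show a larger spring variance than fall variance, a strong fall-to-spring correlation, and a positive correlation between fall scores and change.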


RESULTS 


Descriptive Statistics 


Our first research question was to examine the average change in written narrative 
performance from beginning to the end of the school year for second grade students. To address 
this question, we first report descriptive statistics at the beginning and end of the school year. 
Means and standard deviations for narrative measures of the full sample are provided in Table 1. 


For the total number of words, the students averaged 55 total words in the fall and averaged 69
words in the spring, with a statistically significant increase of 14 words, t(298.0) = 8.32,
p < .0001, d = 0.47. For number of different words, the students averaged 33 unique words in the
fall and 41 unique words in the spring. This change was also significant, t(298.0) = 9.17,
p < .0001, d = 0.53. For proportion of spelling errors, there was a significant decrease from fall to
spring time points, t(280.6) = 7.38, p < .0001, d = 0.26. The ratio of spelling errors decreased
from 15% to 12% on average, relative to the total number of words. There was no significant
change in proportion of grammatical errors from fall to spring in the current study, t(282.9) =
1.32, p = .190.
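
For readers who want to relate the effect sizes to the descriptive statistics, the Python sketch below computes a standardized mean difference from the Table 1 means and standard deviations. The source does not state which formula for d was used; the version here, which divides the mean difference by the root mean square of the two standard deviations, is an assumption, although it reproduces the reported values for TNW and NDW to two decimals.

```python
from math import sqrt


def cohens_d(mean_fall, sd_fall, mean_spring, sd_spring):
    """Mean difference divided by the root mean square of the two SDs (assumed formula)."""
    pooled_sd = sqrt((sd_fall ** 2 + sd_spring ** 2) / 2)
    return (mean_spring - mean_fall) / pooled_sd


# Means and SDs as reported in Table 1.
tnw_d = cohens_d(55.75, 27.95, 69.70, 31.73)  # total number of words
ndw_d = cohens_d(33.35, 14.65, 41.37, 15.56)  # number of different words

print(round(tnw_d, 2), round(ndw_d, 2))  # 0.47 0.53
```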


[insert Table 1] 


Our second research question was to examine individual patterns of growth across the 
school year for students in second grade. To that end, we examined changes in variance in TNW
and tested for Matthew effects. Because NDW and TNW were highly correlated, we did not examine
NDW separately. For TNW, the variance from fall to spring increased from Var=791 to 
Var=1005, which was a significant increase (Grambsch Variance Test, Z= -2.2668, p = 0.0234). 
Additionally, the correlation of TNW from fall to spring was r = .53. Further, we tested for a
relationship between pretest scores and the change from pretest to posttest performance: there
was a significant positive correlation between initial (fall) TNW and growth (change in TNW from
fall to spring), r = .39. Finally, we fit a hierarchical linear model in which the TNW change
score (posttest minus pretest) was the dependent variable and initial status on TNW was the
predictor variable.
Initially we also modeled the classroom and school level variance on the change score, but there 
was no variance at the school level, so that random effect was dropped. The final model
demonstrated that there was a significant and positive relationship between initial level of TNW
and change (estimate = 11.8, t(249.4) = 7.23, p < .0001). The random effect of classroom was
also significant (variance = 47.3, χ²(1) = 5.00, p = .025). The student-within-classroom variance
(residual) was estimated to be 679.5, with a classroom ICC of 47.3/679.5 = 7.0%. This implies
that the Matthew effect varied by classroom. All together, these pieces of evidence suggest the 
existence of a Matthew effect for writing as measured by TNW. The distributions of TNW at fall 
and spring are displayed in the violin plot in Figure 1. A violin plot closely resembles a box plot,
but the sides of the plot represent the density of the distribution.

[insert Figure 1]
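
To make the classroom variance figures reported above concrete, the following Python sketch computes the ratio as it appears in the text alongside the conventional intraclass correlation, in which the between-classroom variance is divided by the total (between plus within) variance. The function name is hypothetical; the two variance components are the values reported in the text. Both calculations lead to the same substantive conclusion that classrooms account for roughly 7% of the variance in change.

```python
def intraclass_correlation(between_var, within_var):
    """Conventional ICC: between-group variance over total variance."""
    return between_var / (between_var + within_var)


classroom_var = 47.3   # classroom (between) variance reported in the text
residual_var = 679.5   # student-within-classroom (residual) variance

ratio_in_text = classroom_var / residual_var
icc_conventional = intraclass_correlation(classroom_var, residual_var)

print(round(100 * ratio_in_text, 1))     # 7.0
print(round(100 * icc_conventional, 1))  # 6.5
```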


Our third research question examined potential predictors of Matthew effects. We
examined potential predictors of the amount of change in TNW nested within classroom. After 
taking into account initial TNW, we examined gender, race/ethnicity, eligibility for free or 
reduced lunch, and language of the home as potential moderators of growth rates. We 
operationally defined a moderator of a Matthew effect as the presence of a significant interaction 
of initial status and the potential moderator. As displayed in Table 2, free/reduced lunch
eligibility was a marginally significant moderator of Matthew effects (interaction with initial
status, p = .056). Gender, race/ethnicity, and language of the home were not significant
predictors above and beyond initial performance level in written lexical productivity (TNW).
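
The operational definition of a moderator given above (an interaction between initial status and the candidate variable) can be illustrated with a simplified Python sketch. For a binary moderator, the interaction coefficient in an ordinary regression with a full interaction term equals the difference between the within-group slopes of change on initial status, which is what the code computes. This is an assumption-laden simplification: it omits the classroom random effect used in the study, performs no significance test, and uses hypothetical data and function names.

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def interaction_estimate(initial, change, group):
    """Slope of change on initial status in group 1 minus the slope in group 0."""
    slopes = {}
    for g in (0, 1):
        xs = [x for x, k in zip(initial, group) if k == g]
        ys = [y for y, k in zip(change, group) if k == g]
        slopes[g] = ols_slope(xs, ys)
    return slopes[1] - slopes[0]


# Hypothetical data: initial TNW, a binary moderator, and fall-to-spring change.
initial = [20, 40, 60, 80, 20, 40, 60, 80]
group = [0, 0, 0, 0, 1, 1, 1, 1]
change = [6, 10, 14, 18, 4, 14, 24, 34]

print(round(interaction_estimate(initial, change, group), 2))  # 0.3
```

A nonzero difference means that the relationship between initial status and growth, and therefore the strength of the Matthew effect, differs between the two groups.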

DISCUSSION 
Key Findings 

The primary purpose of this study was two-fold: a) to describe the written narrative 
performance of second grade students at the beginning and end of the school year and b) to test 
for Matthew effects in lexical productivity and examine predictors of change. Among key
findings, students demonstrated significant increases in lexical diversity, productivity, and
proportion of accurately spelled words across the school year. Students demonstrated an
increase of 14 total words on average. Additionally, the variance in lexical productivity (TNW)
increased significantly from fall to spring, providing evidence for the existence of a
Matthew effect for writing productivity as measured by total number of words. Further, the
results suggest that Matthew effects are predicted by teacher or classroom-level factors and 
socioeconomic status as measured by eligibility for free or reduced lunch. 

The finding that microstructural writing measures (lexical diversity, productivity, spelling 
accuracy) were sensitive to change across the school year is consistent with previous reports in 
the literature (Dockrell et al., 2015; Fey et al., 2004; Malecki & Jewell, 2003; Wagner et al.,
2011) and substantiates the utility of short-duration writing samples for progress monitoring. 
The sensitivity of TNW in the current study was similar to the significant differences in TNW 
between fall and spring reported by Malecki and Jewell (2003) and growth across a 5-month 
period for total words as reported by Dockrell and colleagues (2015) for students in 3rd to 5th
grade. Additionally, the significant change in lexical diversity across the school year
substantiates expected gains in lexical diversity as reported by Fey and colleagues (2004) from
2nd to 4th grade. Further, the finding that spelling accuracy in writing was sensitive to
developmental change in writing is consistent with previous studies (Dockrell et al., 2015). 

In contrast, the lack of change in the rate of grammatical errors was surprising, but may
have been influenced by the fact that errors in the current study were coded broadly without
categorization of error types, which may have decreased sensitivity to change in severity of error.
Overall, findings substantiate that writing productivity (TNW), lexical diversity (NDW), and
proportion of spelling errors are sensitive to developmental change in writing for students in
second grade.

The unique contribution of the current results is the evidence for the existence of
Matthew effects in writing productivity. In the current findings, students who had better written 
lexical productivity (TNW) at the beginning of the school year showed greater growth across the 
school year than students who entered second grade with poor written language productivity. 
This finding is consistent with the notion of Matthew effects described in other domains in which 
the rich get richer and the poor get poorer (Cook & Campbell, 1979; Merton, 1995; Rigney,
2010). To our knowledge, few if any studies have examined Matthew effects in written language.
Writing productivity may be particularly sensitive to such effects given that it is not constrained
by ceiling effects and writing productivity measures tend to show the largest magnitude of
change during early elementary grades, more so than lexical diversity (Wagner et al., 2011).

The finding that teacher/class and socioeconomic status were significant predictors of
Matthew effects appears to be aligned with results reported for Matthew effects in reading
(Morgan et al., 2008), which substantiates that some children are more at risk for Matthew effects
than others. The fact that teachers or classroom-level factors explained 7% of the variance in
growth highlights the malleability of early writing and the important influence of the 
environment on writing. 

In light of the significant role of environmental factors in predicting writing growth, the 
current findings lend support for models that emphasize the interaction of environmental factors 
with individual characteristics and capacity, such as the Revised Writer(s)-Within-Community 
(WWC) Model of Writing (Graham, 2018). Although children’s initial lexical productivity 
influenced spring performance, consistent with the Lexical Quality Hypothesis, the current
findings suggest that teachers or classroom-level factors and socioeconomic resources in the
child’s environment also underpin children’s writing growth. Such predictors seem well aligned
with the WWC Model of Writing, which proposes that writing is simultaneously shaped by the home
and school communities in which writing takes place in addition to individual cognitive and 
linguistic capacity. 

The presence of Matthew effects in written productivity underscores the importance of 
expanding writing instructional support for students in elementary school with low initial 
performance on writing. Further, the finding that teacher/classroom and socioeconomic level 
predicted Matthew effects suggests that certain groups of children may be more likely to require 
additional writing supports to prevent achievement gaps from widening. Although the current
data cannot identify which aspects of classrooms or teachers influenced writing growth, the
current findings affirm that classroom features matter for early writing growth. Previous work in 
literacy has noted the effects of quality of instruction, classroom resources, peer-to-peer support, 
and time spent in literacy on student literacy outcomes (Cunningham & Stanovich, 1997; 
Guthrie, Wigfield, Metsala, & Cox, 1999). By better understanding Matthew effects, we hope
to identify effective ways to prevent the effects or neutralize opportunity imbalances,
such as bolstering writing instruction and support for those students at risk.


Limitations 

The period studied was narrow: growth was examined from fall to spring
within the same school year. Given only two time points, we were limited in the types of analyses
that could be used. As such, preferred methods such as quasi-simplex modeling or latent growth 
curves could not be utilized. Although our primary research aim was to isolate and describe 
change within second grade, additional longitudinal studies of successive school years are 
needed to more fully describe development and add to the knowledge base on written language
development across the elementary school years. Similarly, although we focused on component
skills in the current study, in future studies it would be interesting to examine students’ average
change in measures that assess more depth in vocabulary, rather than just breadth, and measures 
that reflect more quality ratings or holistic aspects with additional focus on development of 
ideas, cohesiveness, and organization. Although not available in the current study, it would be 
interesting to examine macrostructural measures such as the six-trait writing rubric (STWR;
Education Northwest, 2006) or similar Likert-type scales of quality (Dockrell, Ricketts, Charman,
& Lindsay, 2014; Koutsoftas, 2016; Wechsler, 2005).

The use of only one sample at each time point was a noted limitation. Because the 
prompt was the same at each time point, students may demonstrate change simply because they 
have thought about the topic previously, and not because their underlying written language skills 
have improved. Although this is a limitation, it is common practice to administer the same
prompt to keep the measure similar across two time points (Abbott et al., 2010; Dockrell et al., 2015;
Juel, 1988; authors, 2018). Additionally, given a single writing task at each time point, it cannot
be assumed that the sample is representative of students’ typical work or that similar results would be
observed using other types of prompts (e.g., persuasive or explanatory). Other authors (e.g.,
Olinghouse & Leaird, 2009) have reported that vocabulary diversity remains stable
across two different writing tasks, but that other measures from children’s writing samples vary
across writing tasks. In a future study it would be interesting to examine differences in other 
types of samples (e.g., persuasive and explanatory) and their use as progress monitoring tools for 
young school age students. 

Another limitation for consideration is the lack of available assessment information on 
other skills that could be related. For example, in the current study we did not have access to
information about the students’ verbal reasoning skills and working memory, which have been
shown to relate to reading and writing achievement in previous studies (Berninger & Richards,
2010; Prifitera, Weiss, Saklofske, & Rolfhus, 2005). Given that prior evidence supports a
relationship between verbal reasoning and reading and writing achievement (e.g., Prifitera et al.,
2005) and a relationship between verbal working memory and achievement (e.g., Berninger &
Richards, 2010), we cannot rule out that differences in average change over the school year are 
in part related to students’ verbal reasoning and verbal working memory. In future studies, it 
would be interesting to consider other measures of language performance to further explore 
factors that predict the developmental trajectory of writing skills. Although exploring verbal 
reasoning and memory skills was not an aim of the current study, it would be beneficial for 
future studies to include such factors as potential moderators of writing performance and growth.

Implications and Suggestions for Future Studies

Despite limitations of the study, the findings substantiate the sensitivity of written 
personal narrative measures to developmental changes in children’s performance across the 
school year. The developmental changes support the usefulness of written narratives for 
progress monitoring the language development of young school age children. The current 
findings also highlight specific components that appear to be malleable across the school year. 
The identification of components that are expected to show growth across the school year may 
be particularly useful for teachers and related personnel in progress monitoring and program 
planning. Further, the important role of initial vocabulary skills on narrative performance adds 
to our understanding of children’s outcomes and predicted performance across the school year. 

The presence of Matthew effects in the writing skills of second grade students raises
questions that warrant further study. Additional studies are needed to understand
why and under what circumstances Matthew effects occur and do not occur in writing skills. The
source and nature of inequalities require more scientific inquiry. Additional study of the
underlying mechanisms or factors sustaining inequalities is also needed (e.g., motivation, writing 
experience, access to print) in order to identify ways to prevent gaps in writing achievement. In 
future studies it would be interesting to explore other potential moderators of the effect, such as 
quantity and quality of language experiences and exposures, duration of writing instruction, 
and/or writing instructional strategies. The presence of gaps in writing achievement warrants 
exploration of innovative ways to minimize achievement gaps and expand writing instructional
supports for students with low initial performance in writing.
Table 1

Fall and Spring Performance on Writing Measures and Descriptive Standardized Assessments of Language

                                      Fall                      Spring
Measure                        N      M        SD        N      M        SD

Writing Measures
  Number of Different Words    299    33.35    14.65     299    41.37    15.56
  Number of Total Words        299    55.75    27.95     299    69.70    31.73
  Errors of Spelling           299     0.15     0.12     299     0.12     0.11
  Errors of Grammar            298     0.03     0.04     298     0.03     0.04

Standardized Assessments
  PPVT Raw Score               299   127.60    18.57     299   138.41    17.94
  PPVT Standard Score          299   104.39    13.98     299   105.19    13.59
  EVT Raw Score                299    94.34    16.10     299   102.22    14.98
  EVT Standard Score           299   100.71    13.69     299   101.55    12.93
  CELF-FS Raw Score            299    29.28     8.36     299    31.16     8.20

Note. FRL refers to eligibility for free or reduced lunch. TNW refers to total number of words. Errors of Spelling
refers to the proportion of spelling errors to total words. Errors of Grammar refers to the proportion of
grammatical errors to total words. PPVT refers to the Peabody Picture Vocabulary Test-IV (Dunn & Dunn,
2007). EVT refers to the Expressive Vocabulary Test-2 (Williams, 2007). CELF-FS refers to the formulated
sentences subtest of the Clinical Evaluation of Language Fundamentals-5th Edition (Wiig, Secord & Semel,
2013).

Table 2

Parameter Estimates Examining Moderators of Matthew Effects

                                                            t-value/
Model                Effect                    Estimate     Chi-squared    df        p-value

Gender               Initial (Fall TNW)           8.75         3.42        274.4     0.001
                     Gender                     -14.50        -2.08        294.3     0.038
                     Initial*Gender               0.194        1.71        293.1     0.089
                     Classroom Variance          42.70         4.35        1         0.036
                     Residual                   676.10

Race                 Initial (Fall TNW)           9.23         1.79        266.9     0.075
                     Race                                     -0.31        284.9     0.757
                     Initial*Race                              0.29        275.5     0.772
                     Teacher Random Effect       44.33         4.06        1         0.044
                     Residual                   684.33

Free/Reduced Lunch   Initial (Fall TNW)           9.58         4.94        237.97    <.0001
                     FRL                          0.85         0.24        171.06    0.812
                     Initial*FRL                  7.03         1.92        292.98    0.056
                     Teacher Random Effect       36.55         2.99        1         0.084
                     Residual                   679.75

Language of Home     Initial (Fall TNW)          13.17         7.79        238.81    <.0001
                     Language of Home            -2.10        -0.41        202.82    0.679
                     Initial*Lang. of Home       -9.87        -1.82        280.53    0.070
                     Teacher Random Effect       41.45         4.04        1         0.044
                     Residual                   651.68

Note. FRL refers to eligibility for free or reduced lunch. TNW refers to total number of words. There
is no estimate for race or the interaction of race and initial status because it is a multiparameter
test.

Figure 1.
Matthew Effects in Students’ Written Productivity as Measured by Total Number of Words.


Note. The figure on the left shows fall lexical productivity as compared to spring on the right. 


